Towards the Detection of Cross-Language Source Code Reuse

نویسندگان

  • Enrique Flores
  • Alberto Barrón-Cedeño
  • Paolo Rosso
  • Lidia Moreno
چکیده

Internet has made available huge amounts of information, also source code. Source code repositories and, in general, programming related websites, facilitate its reuse. In this work, we propose a simple approach to the detection of cross-language source code reuse, a nearly investigated problem. Our preliminary experiments, based on character n-grams comparison, show that considering different sections of the code (i.e., comments, code, reserved words, etc.), leads to different results. When considering three programming languages: C++, Java, and Python, the best result is obtained when comments are discarded and the entire source code is considered.

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Dwarf Frankenstein is still in your memory: tiny code reuse attacks

Code reuse attacks such as return oriented programming and jump oriented programming are the most popular exploitation methods among attackers. A large number of practical and non-practical defenses are proposed that differ in their overhead, the source code requirement, detection rate and implementation dependencies. However, a usual aspect among these methods is consideration of the common be...

متن کامل

(CLSCR) Cross Language Source Code Reuse Detection Using Intermediate Language

In today's digital era information access is just a click away. so computer science students also have easy access to all the source codes from different websites thus it has become difficult for academicians to detect source code reuse in students programming assignments. The new trend in the area of source code reuse is using the source code by translating it in another programming language p...

متن کامل

Normalization based Stop-Word approach to Source Code Plagiarism Detection

This paper is a report of PES Institute of Technology’s participation in the Cross Language Detection of Source Code Reuse (CL-SOCO) task at FIRE 2015 [1]. We approach this task as text document plagiarism task, without considering formal programming language grammatical structure. We use normalization of commonly used identifiers to detect pair of programs which have the same objective. We als...

متن کامل

Mainland Chinese Students’ Shifting Perceptions of Chinese-English Code-Mixing in Macao

As a former Portuguese colony, Macao is the only region in China where Cantonese, a variety of Chinese, and English, an international language, are enjoying de facto official statuses, with Putonghua being a quasi-official language and Portuguese being another official language. Recently, with an increasing number of Mainland Chinese students crossing the border to pursue their tertiar...

متن کامل

Source Code Reuse Analysis in Multiple Projects based on the Clone Genealogy

In the software industry and OSS projects, it is said that source code reuse could improve productivity and reliability of software development, and reduce development time. On the other hand, source code reuse requires professional skills to developers. Ad-hoc reuse might introduce some maintenance problems. The source code reuse analysis for software development organizations is worthy to be ...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

عنوان ژورنال:

دوره   شماره 

صفحات  -

تاریخ انتشار 2011